THE BACKGROUND AND EVOLUTION OF THE METHOD OF LEAST SQUARES


Churchill Eisenhart 
National Bureau of Standards 
Washington, D.C.

DIGEST* 


1. Introduction


The present status of the Method of Least Squares is this: Everyone uses it, but
not in exactly the same way, nor for the same reasons. There is thus some similarity
to the present status of Probability, with respect to which Bertrand Russell has
remarked [o9, p. 344]: "While interpretation in this field is controversial, the
mathematical calculus itself commands the same measure of agreement as any other
branch of mathematics." But the situation with respect to the Method of Least Squares
is not exactly parallel: In the case of the Method of Least Squares there is complete
agreement on the procedure for forming the normal equations from the fundamental
observational equations; and everyone comes up with the very same numbers for the
solutions of these equations, but their reasons for employing the Method of Least
Squares, their understanding of its objectives and the conditions under which these
are achieved, and their interpretations of end results of its application may be
quite different. Furthermore, in contrast to the situation in the case of Probability,
members of one "Least Squares School" are not generally aware of the existence of the
others.

This somewhat extraordinary situation results from the fact that the Method of
Least Squares was developed originally from three distinctly different points of view:

(1) LEAST Sum of SQUARED RESIDUALS (Legendre, 1805), (2) MAXIMUM PROBABILITY of ZERO
ERROR of Estimation (Gauss, 1809), and (3) LEAST Mean SQUARED ERROR of Estimation
(Gauss, 1821). These differ not only in their aims and in their initial assumptions,
but also in the meanings that they attach to the numbers that all three yield as a
common answer to any given problem. The existence of these three different approaches
to the subject, and the consequent possibility of different interpretations of the end
results of applying the Method, are rarely mentioned in books on the practical
applications of the Method of Least Squares. The only exception of which I am aware is
Whittaker and Robinson's The Calculus of Observations, first published in 1924, in
chapter IX of which one finds discussions of Legendre's approach, Gauss's first
approach, and Gauss's second approach, which Gauss himself definitely preferred
[G3, p. 224].


2. Repeated Measurement of a Single Quantity and the Principle of the Arithmetic Mean

All three developments of the Method of Least Squares stemmed from 

The Principle of the Arithmetic Mean: Given a number of measurements of a single 
quantity, made with the same care under the same circumstances, the best value of 
the unknown magnitude of this quantity afforded by the measurements in hand is 
the arithmetic mean of their values. 

The Principle of the Arithmetic Mean seems to have originated in western Europe
sometime in the latter half of the 16th century A.D.; and appears to have evolved from
the Method of Reversal for eliminating (or, at least, reducing) the effects of
systematic errors, that is, from the technique of taking measurements in pairs such
that the two members of a pair are affected by systematic errors of (approximately)
equal magnitude but of opposite signs, in which case the arithmetic mean of the pair is
(at least, more nearly) free from the effects of these errors. Indeed, the practice of
taking several measurements of a single quantity by the same method under essentially
the same circumstances -- a necessary precursor to taking the arithmetic mean of such
measurements -- seems to have originated early in the 16th century A.D. in connection
with the efforts of mariners to devise a method for determining the longitude of a ship
at sea from observations on the deviation of a compass needle from the true north.


*Prepared for distribution to participants of the 34th Session of the International
Statistical Institute, Ottawa, Canada, 21-29 August 1963.


I find that many persons today are surprised to learn that the practice of taking
several measurements of a single quantity by the same method under essentially the same
circumstances for the purpose of increasing one's reliance in the resulting value is a
fairly recent development in relation to the whole history of science. In antiquity
astronomy and physics were predominantly mathematical in character. The Babylonian and
Greek astronomers of antiquity were primarily mathematicians; the Sun, the Moon, and
the Planets were simply the objects of their mathematical analyses. Great attention was
given to mathematical details; much ingenuity was shown in the solution of the
mathematical problems involved; the data used to determine numerical values for the
parameters in their mathematical formulae "were instances or specimens, chiefly, but
not necessarily, taken from observation with all its uncertainties, not intended as
important new knowledge but often simple verifications, easily accepted, of respected
earlier knowledge" [7b, p. 150]. At best, numerical values for parameters were
determined from single observations corresponding to special or extreme circumstances
that were considered to be especially favorable or decisive. The general practice was
to deduce a lot from very few data. "One can ... demonstrate that [Babylonian] tables
[from the period 240-40 B.C.] for the phenomena of Jupiter, computed ahead for several
decades, were based on a single observational element, the rest being derived therefrom
in strictly mathematical fashion. This conforms to a conscious tendency of ancient
astronomers to reduce the empirical data to the barest minimum, because they were well
aware of the great insecurity of direct observation." [71, p. 801] "I do not know of
any case from antiquity of repeated observations of the same quantity. In principle
that does not exclude that such observations were made; theoretical treatises like the
Almagest [of Claudius Ptolemy, 2nd century A.D.] would not mention such things nor
would they show up in computed ephemerides, etc. But I think that it is very unlikely
that one ever repeated on purpose an observation in order to find a more secure value."
[70]. It was the need for definitive answers to such questions as 'Was the New Star of
1572 infinitely distant from the Earth like the other stars? Or was it closer than the
Moon? Or was it a comet?' that led Tycho Brahe (1546-1601) to initiate the practice of
taking repeated measurements of the relative positions of heavenly bodies [76, pp. 207-
208, 214].

Prior to the 17th century A.D., astronomers and physicists alike selected from the
available relevant observations those that seemed to them to be the best, in the sense
of being in best agreement with their own observations, accepted theory, or what seemed
to be the nature of the phenomena. Claudius Ptolemy is notorious for his selection of
data to suit his own ends [73]; and selection of data was the usual practice of
John Flamsteed (1646-1719), the first Astronomer Royal of Great Britain, according to
his official biographer, Francis Baily (1774-1844):

"His mode of proceeding was different from that adopted at the present day.
He does not appear to have taken the mean of several observations for a more
correct result: for, we find that, where more than one observation of a
star has been reduced, he has generally assumed that result which seemed to
him most satisfactory at the time, without any regard to the rest. Neither,
in fact, did he reduce the whole (nor anything like the whole) of his
observations; many days' work having been wholly omitted in his computation-book.
And, moreover, many of the results, which have been actually computed in that
book, have not been inserted in any of his MS catalogues; either from
inadvertence, or from some suspicion of their accuracy. The reader will find
many instances of this kind adduced in the Notes. So that, in fact, the
British Catalogue, even corrected and enlarged as it now is, does not present
a rigidly correct and [word illegible] result of the observations; and can only
be considered as an index and guide to those who may be disposed to examine more
minutely the position of any particular stars, or to search into any branch
of astronomy (such as the lunar or planetary motions) connected therewith.
Thus, although there are 163 observations of η Geminorum, only 1 of those
observations have been reduced by Flamsteed; and he has taken the result of
the first reduction as the correct value. Again, there are 124 observations
of γ Geminorum, yet only 2 of those have been reduced by Flamsteed; and he
has here also taken the result of the first reduction as the correct value
although there is a difference of 3' between the two. These are not singular
cases, but the same method is pursued throughout the whole of his work. He
seems to have been more solicitous about increasing the number of his stars
in order to complete his catalogue, than anxious to reach those refinements
in the art of reduction which have rendered modern observations of so much
value, but which were neither known, nor even suspected, at that period."

[30, p. 370. Underscoring added.]

In this connection it needs to be remembered that from the days of Pythagoras
(6th century B.C.) until Johannes Kepler (1571-1630) it was a matter of firm belief
that the heavenly bodies travel in circular orbits, and when observations apparently
disagreed the problem was to "save the phenomena", as it was called, by devising
appropriate corrections to the observations, or by discrediting them altogether -- the
theory was above question: "To hold other views was not a scientific error but a heresy.
Astronomy in antiquity was as thorny a subject as biblical criticism in modern times.
Observational astronomy was subjected to anxious scrutiny and careful management"
[82, p. 223]. Kepler, respecting this tradition, labored for nearly six years and
through "at least seventy" separate calculations in his attempts to explain an 8'
discrepancy in the orbit of Mars implied by Tycho Brahe's observations before
recognising, accepting, and then proving the ellipticity of its orbit.


The evolution of "taking the mean" from implicit practice to explicit principle 

can be summarised as follows: 

1519    Before 1519, Francisco Falero, a Portuguese in the service of the Spanish Navy,
        and Filipe Guillen, an apothecary of Seville, had devised sun-dial instruments
        equipped with magnetic needles, for determining true north at sea by taking
        the mean of the azimuths of the gnomonic shadows corresponding to equal
        altitudes of the Sun before and after noon, and thence determining the
        declination of the magnetic needle by noting its deviation from the north so
        determined. [60, pp. Ul-33; 64, p. 79]

1535    Falero's treatise on navigation [1], published in 1535, contains the first
        printed description of the abovementioned technique, together with an explicit
        recommendation that several measurements of the magnetic declination be made
        on a single day in the interest of greater assurance in the value so found.
        [1, para d; 64, p. 33] He says nothing on how to choose a "best value" when
        the values so found are nonidentical.

1538-41 João de Castro, utilizing a Falero-Guillen type instrument improved by the
        addition of a device for determining the Sun's altitude [2], recorded 43 sets
        of magnetic declination determinations taken on voyages from Lisbon (Portugal)
        to Goa (India), along the west coast of India, and in the Red Sea from India
        to Suez [3]. The two, three, or four values recorded (to 1/4°) for a single
        day are often identical, and differ at most by only 3/4°. [60, p. 85; 67, p. 197]

1581    William Borough, Comptroller of the Navy of Queen Elizabeth, reports 9 values
        (to 1/2 min.) of the magnetic declination at Limehouse obtained by himself on
        October 16, 1580, using the Falero-Guillen technique and a similar instrument
        of his own design, "and conferring them altogether, I doe finde the true
        variation of the Needle or Compass at Limehouse to be about 11d. 1/4 or
        11d. 1/3, which is a point of the compasse just [360°/32 = 11 1/4°] or little
        more" [4, chapter 3, last paragraph]. For these 9 values I find:

            mean = 11° 19 [?]/9' = 11.32°;     median   = 11° 17 1/2' = 11.29°
            mode = 11° 22 1/2'   = 11.37°;     midrange = 11° 17'     = 11.28°

        Clearly the operation of "conferring them altogether" is not unequivocally
        defined by the outcome, but I consider it reasonable to conclude that he
        actually took the arithmetic mean.

1582    Twenty-seven separate determinations of the right ascension of the star α
        Arietis were made by the Danish nobleman, Tycho Brahe, founder of modern
        observational astronomy, during the period February 1582 through December
        1[...]. To eliminate the effects of parallax and refraction, he combined 24 of
        these to form 12 pairs such that one determination of a pair was based on
        observation of Venus west of the Sun, the other on observation east of the
        Sun, these observations being so selected that the altitudes, declinations,
        and distances from the Earth of the Sun and Venus, respectively, were as far as
        possible the same in each of the two instances. From the means of these 12
        pairs (whose time mid-points range from [date illegible] to 1587 Dec. 22) and
        the remaining 3 individual determinations (corresponding to Feb., March, and
        April 1582) he adopts 26° 0' 30" as the value of the right ascension of α
        Arietis at the end of the year 1585, i.e. the mid-point of the period of
        observation, without further explanation. [5, Part 1, chapter 2, article
        193; Opera, Vol. II, p. 197; 56, p. 350; 81, p. 130-132]. His data yield
        the following:


                            12 pairs            12 pairs + 3 indiv.

        mean                26° 0' 27"          26° 0' 29"
        median              26° 0' 26 1/2"      26° 0' 30"
        mode                none                none
        midrange            26° 0' 23"          26° 0' 24"

        time-mean           1586 Aug. 17        1585 Sept. 26
        time-midrange       1585 Dec. 2         1585 Jan. 24


        Inasmuch as the coordinates of nine standard reference stars given in
        his star catalogue are all rounded to 5", I conclude that Tycho evaluated the
        midrange of the time mid-points of his 12 paired values, and the arithmetic
        mean of the corresponding 12 mean right ascension determinations, rounding
        the resulting values to the nearest half-year and nearest 5", respectively.
        This is certainly astute data analysis, but it is not exactly taking the
        arithmetic mean of several measurements of a single fixed quantity obtained
        by the same method under essentially the same circumstances. It exemplifies
        a more general technique of which taking the mean of several strictly
        comparable measurements is the limiting case as the range of "circumstances"
        involved shrinks to zero.


1622    Sometime prior to 1622, Edmund Gunter (1581-1626), Professor of Astronomy
        at Gresham College, London (1619-1626), worked out computational procedures
        for solving problems such as the following, of which he gives worked
        examples (among others) in his book, The Cros-staff [6]:

            10. Having the Latitude of the place, and the Declination of the Sun,
                to find the Azimuth [of the Sun]. [10, p. 267]

            13. Having the hour of the day, the Sun's Altitude, and the
                Declination, to find the Azimuth. [10, p. 271]

        "Having these means to find the Sun's Azimuth, we may compare it with the
        [...] the variation of the Needle" [10, p. 27d]. Finding in this way a
        "variation" of only "6 gr. 15 m.", whereas Borough had [...] 15 m. in [...],
        he "inquired after the place where Mr. Borough observed, and went to
        Limehouse with some friends ... and towards night the [...]" [...]
        determinations of the "variation" of the Needle [...] [10, p. 279]. The
        largest, "6 gr. 13 m.", was [...] less than Borough's mean value, and nearly
        5° less than his smallest value, 11 gr. 11 1/2 m. Gunter seems to have
        regarded [...] as casting doubt on the accuracy of Borough's findings, or his
        own; he does not attempt to arrive at a 'best' value for 1622 from his own
        data.


1635    Henry Gellibrand [...], Professor of Astronomy at Gresham College, London,
        [...] the change with time of the declination of the magnetic needle at a
        given place, from a comparison of declination measurements made at Limehouse
        by William Borough in 1580, and at the same place by Gunter in 1622 and
        himself in 1634. Of Gunter's eight declination measurements taken on June 13,
        1622 the largest was 6° 13', which was nearly 5° less than Borough's smallest
        (11° 11 1/2'), the difference being ascribed at first to errors in Borough's
        values. To resolve this suspicion Gellibrand selects a particular set of
        Borough's shadow and needle readings corresponding to 20° morning and
        afternoon apparent elevations of the Sun, subjects these to detailed
        astronomical analysis, and obtains 11° 0' 0" from the morning data and
        11° 32' 28" from the afternoon data, for comparison with Borough's
        corresponding single declination value 11° 22' 30". The differences among
        these are clearly negligible, "So that if we take the Arithmeticall meane, we
        may probably conclude the variation [i.e. declination] answerable to his time
        to be about 11 gr. 16 min." [7, p. 15]. His own 11 determinations, made on
        June 12, 1634, ranged only from 3° 55' to 4° 12' so that "These Concordant
        Observations can not produce a variation greater than 4 gr. 12 min. nor less
        than 3 gr. 55 min., the Arithmeticall meane limiting it to 4 gr. and about
        4 minutes" [7, p. 10]. The exact arithmetic mean of his values is 4° 4 9/11'.


1638    Galileo appears to have been one of the first experimental scientists to
        recognise the need for attaining a degree of consistency among repeated
        measurements of a single quantity before the method of measurement concerned
        can be regarded as meaningful. Thus, describing his famous experiment on the
        acceleration of gravity in which he allowed a ball to roll different
        distances down an inclined plane, Galileo wrote [8, Third Day; Nat'l.
        edition, p. 213]:

            ". . . we let, as I was saying, the ball descend through said
            channel, recording, in a manner presently to be described, the time
            it took in traversing it all, repeating the same action many times
            to make really sure of the magnitude of time, in which one never
            found a difference of even a tenth of a pulsebeat. Having done and
            established precisely such operation, we let the same ball descend
            only for the fourth part of the length of the same channel; . . ."

        I am grateful to my colleague Ugo Fano for this translation.


1668    Vol. III of the Philosophical Transactions contains "An Extract of a
        Letter, written by [...] to the Publisher" presenting a table of 5 magnetic
        declination measurements made near Bristol on June 13, 1666, by
        Capt. Samuel Sturmy, "an experienced Seaman, and a Commander of a Merchant
        Ship for many years," who took them "in the presence of Mr. Staynred, an
        ancient Mathematician, and others." "In this Table, [Capt. Sturmy] notes the
        greatest ... difference to be 14 minutes; and so taking the mean for the true
        Variation, he concludes it then and there to be just 1. deg. 27. min., viz.
        June 13, 1666." [9, p. 726] The exact mean of the 5 reported values is
        1° 27.3'; the median = the midrange = 1° 24'.

It is evident from the foregoing that taking the arithmetic mean of repeated
measurements of a single quantity as the best approximation to the true value of this
quantity afforded by the measurements in hand had become an established, if not
universal, practice in the field of terrestrial magnetism before 1700. The practice
spread to other fields, but not without some mixed cases to confuse the issue. For
instance:

1737    De Maupertuis, in his report of November 13, 1737 to the French Academy of
        Sciences on the measurement of a meridian arc of one degree at the Arctic
        Circle in Lapland, states [13, p. 422; 15, p. 36] that in measuring the
        angles involved "each one made his own observation and wrote it down
        separately, and afterwards we took the milieu [middle, or mean?] of all of
        these observations, which differed little from each other". The raw
        observations are not given here, nor in his full report [14, p. 431-466; 13,
        p. 791 f]. Pinkerton [03, p. 240] and Clarke [55, p. 5] translated "milieu"
        as "mean", and Plackett [81, p. 133] cites Clarke's rendition as unequivocal
        evidence of use of the arithmetic mean by De Maupertuis. Unfortunately the
        available evidence is mixed. In his full report, De Maupertuis gives
        [14, p. 435; 15, p. 36] three separate determinations of an angle PQM,
        namely, 28° 51' 23", 28° 51' 30", 28° 52' 22", and then says "prenant un
        milieu entre toutes ces déclinaisons, on a pour la déclinaison de Pullingi,
        ou l'angle PQM, 28° 51' 52"". Here the "milieu" taken is evidently the
        middle in the sense of the midrange, since the extreme values 28° 51' 23" and
        28° 52' 22" average to 28° 51' 52 1/2", whereas the arithmetic mean of the
        three values is 28° 51' 45". On the other hand, later in his full report, he
        gives [14, p. 447] two sets of 5 measurements each, "dont le milieu" [...],
        for which the "milieu" reported in each case is unquestionably the
        arithmetic mean.
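The distinction between the two readings of "milieu" can be checked numerically. The following is a minimal sketch, assuming the three PQM determinations are read as 28° 51' 23", 28° 51' 30" and 28° 52' 22" (a reading inferred here from the reported result 28° 51' 52"); the helper names are illustrative only.

```python
def dms_to_arcsec(deg, minutes, seconds):
    """Convert degrees, minutes, seconds of arc to seconds of arc."""
    return deg * 3600 + minutes * 60 + seconds

def arcsec_to_dms(total):
    """Convert seconds of arc back to (degrees, minutes, seconds)."""
    deg, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return int(deg), int(minutes), seconds

# Assumed reading of the three determinations of the angle PQM.
angles = [(28, 51, 23), (28, 51, 30), (28, 52, 22)]
values = [dms_to_arcsec(*a) for a in angles]

midrange = (min(values) + max(values)) / 2   # half-way between the extremes
mean = sum(values) / len(values)             # arithmetic mean

print(arcsec_to_dms(midrange))  # (28, 51, 52.5)
print(arcsec_to_dms(mean))      # (28, 51, 45.0)
```

Under this reading the reported 28° 51' 52" agrees with the midrange to within rounding and differs from the arithmetic mean by about 7".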

In March 1755, Thomas Simpson wrote [2U, p. d2-U2]

"that the method practised by astronomers, in order to diminish the
errors arising from the imperfections of instruments, and of the
organs of sense, by taking the Mean of several observations, has not been
so generally received, but that some persons, of considerable note,
have been of opinion, and even publickly maintained, that one single
observation, taken with due care, was as much to be relied on as the
Mean of a great number",

and then proceeded to provide a mathematical justification for the practice, based on
the mathematical theory of probability. More on this later. In 1757 he remarked [22,
p. 64] that "the method practised by Astronomers ... [of] taking the mean of several
observations, is of very great utility, and almost universally followed." Taken
together these statements seem to imply that the Principle of the Arithmetic Mean had
become widely, but not universally, accepted by the middle of the 18th century.


3. Measurement of Two or More Related Quantities and Minimization of Residuals

When two or more related quantities are measured individually, the resulting
measured values usually fail to satisfy the constraints on their magnitudes implied by
the given inter-relations among the quantities concerned. In such cases these "raw"
measured values are mutually contradictory and require adjustment in order to be usable
for the purpose intended. Thus, if one has an observed value for each of the three
interior angles of a plane triangle, these values are strictly speaking not usable as
values for these angles unless they sum to 180°. The primary goal of combination or
adjustment of observations is to derive from such inconsistent measurements, if
possible, adjusted values for the respective quantities concerned that do satisfy the
constraints on their magnitudes imposed by the nature of the quantities themselves and
by the existing interrelations among them. A second objective is to select from among
all possible sets of adjusted values, a set that is "best" in some well-defined sense.

Inasmuch as the actual errors of individual observations are usually unknown and
forever unknowable, attention seems to have been directed first to minimising the
apparent inconsistency of observations as evidenced by some simple function of their
residuals.*


*If $Y_1, Y_2, \ldots, Y_n$ are observed values of a magnitude $\alpha$, then
$Y_1 - \alpha = \varepsilon_1$, $Y_2 - \alpha = \varepsilon_2$, ...,
$Y_n - \alpha = \varepsilon_n$ are the errors of the respective observations. If, the
value of $\alpha$ being unknown, one adopts some particular value for it, say $a$, then
$Y_1 - a = R_1$, $Y_2 - a = R_2$, ..., $Y_n - a = R_n$ are the residuals of the
observations corresponding to the adjusted value $a$.


The successive stages of this approach to the general problem of combination of
observations, which culminated in Legendre's 1805 pronouncement of his version of the
Method of Least Squares, were as follows:

        When several 'equally good' measurements of a single quantity are available,
        the Principle of the Arithmetic Mean states that the 'best' value to take is
        their arithmetic mean. The arithmetic mean $a$ of a set of measurements
        $Y_1, Y_2, \ldots, Y_n$ is the solution of the equation

        $$\sum_{i=1}^{n} (Y_i - a) = 0, \tag{1}$$

        that is, the value determined by the condition of zero sum of residuals. In
        the language of physics since the days of Archimedes (287-212 B.C.),
        equation (1) says that $a$ is the abscissa of the center of gravity of $n$
        equal masses situated at abscissae $Y_1, Y_2, \ldots, Y_n$, respectively.
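The equivalence between the arithmetic mean and the condition of zero sum of residuals in equation (1) can be illustrated with a small sketch. The measurements below are hypothetical, chosen only for illustration, and the function name is an assumption; exact rational arithmetic is used so that the zero sum is not blurred by rounding.

```python
from fractions import Fraction

def zero_sum_value(measurements):
    """Return the value a that makes sum(Y_i - a) equal to zero,
    i.e. the arithmetic mean of the measurements."""
    return Fraction(sum(measurements), len(measurements))

# Hypothetical measurements of a single quantity (not taken from the text).
Y = [10, 12, 11, 15]

a = zero_sum_value(Y)
residuals = [y - a for y in Y]

print(a)               # 12
print(sum(residuals))  # 0
```

Any other adopted value shifts every residual by the same amount, so the sum of residuals is zero only at the mean.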
1722    Roger Cotes (1682-1716), the first Plumian Professor of Astronomy and
        Experimental Philosophy at Cambridge University and editor of the 2nd
        edition of Newton's Principia, shows in his Aestimatio errorum [11],
        published posthumously in 1722, how the errors of measured values of various
        astronomical quantities of general interest are related to the errors of the
        primary astronomical observations from which they are derived. For instance,
        on p. 21, he considers the practical problem of determining the time (of day
        or night) t from the observed altitude h of some star and then shows how
        the error dt in t resulting from an error dh in h depends functionally
        on h, on the observer's latitude m, and on the angle A between
        the observer's meridian and the star's vertical circle. He then says
        [11, p. 22]:

            "In quite the same way, in other cases, one finds Limits of Error that
            derive their origin from the less accurate observations, because the
            Positions most suitable for Observing are inaccessible: so that to me
            hardly anything further seems to be desired once it has been shown by
            what argument one can obtain maximum Probability in those circumstances,
            where diverse observations, arranged for the same purpose, exhibit
            conclusions only slightly different from each other. This, however,
            can be done in the way of the following Example. Let p be the
            location of some Object as determined from a first Observation, q, r, s
            the locations of the same Object from subsequent Observations; let
            furthermore P, Q, R, S be masses inversely proportional to the lengths
            of the Deviations over which one can spread the Errors arising from the
            single Observations, and which are determined by the given Error Limits;
            and at the points p, q, r, s let us imagine masses P, Q, R, S and let
            us find their center of gravity Z: I say that the point Z is the most
            probable Location of the Object, which can indeed be most safely
            assumed to be its place."


        I am grateful to my colleague Franz Alt for this translation from
        the original Latin. William Kruskal has drawn to my attention an
        English translation, by Augustus De Morgan, of the portion after the
        first colon [84, p. 379].
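Cotes' rule, masses inversely proportional to the lengths over which the errors can spread and placed at the observed locations, can be sketched as a small computation. The locations and error lengths below are hypothetical, and the function name is illustrative; this is only one reading of the quoted passage for locations along a single line.

```python
from fractions import Fraction

def cotes_center_of_gravity(locations, error_lengths):
    """Center of gravity Z of masses placed at the observed locations,
    each mass being inversely proportional to its error length."""
    masses = [Fraction(1, d) for d in error_lengths]
    moment = sum(m * p for m, p in zip(masses, locations))
    return moment / sum(masses)

# Hypothetical one-dimensional locations p, q, r, s and their error lengths.
locations = [10, 12, 11, 15]
error_lengths = [1, 2, 1, 4]

Z = cotes_center_of_gravity(locations, error_lengths)
print(Z)  # 123/11
```

Observations with short error lengths pull Z toward themselves: here the unweighted mean of the locations is 12, while Z is a little above 11.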


We can express Cotes' proposal in terms more familiar to us today as follows.
If, for example, x and y are two related variables, with $y = F(x, \beta)$, where
$\beta$ is a parameter of the relationship, so that $\beta = f(x, y)$, say, and if one
wishes to deduce a 'best' value for $\beta$ from a number of observed values of y, say
$Y_1, Y_2, \ldots, Y_n$, corresponding to known fixed values of x, say
$x_1, x_2, \ldots, x_n$, then, according to Cotes' proposal, the 'best' value to take
for $\beta$ is the solution B of the equation

$$\sum_{i=1}^{n} w_i (B_i - B) = 0, \tag{2}$$

namely, the weighted arithmetic mean

$$B = \frac{\sum_{i=1}^{n} w_i B_i}{\sum_{i=1}^{n} w_i} \tag{3}$$

of the individual 'observed' values

$$B_i = f(x_i, Y_i), \qquad (i = 1, 2, \ldots, n), \tag{4}$$

with weights

$$w_i = \frac{1}{\Delta B_i}, \tag{5}$$

respectively, where $\Delta B_i$ is the uncertainty of $B_i$ that corresponds to
$\Delta Y_i$, the uncertainty of $Y_i$, that is, the length "over which one can spread
the Errors" to which $Y_i$ is subject. This is the essence of Cotes' proposal.

Two worked examples will be helpful at this juncture:

Example 1. Suppose that $y = \beta x$, with $x, y \ge 0$, and the problem is to
determine the constant of proportionality $\beta$. To this end one would ordinarily
choose values $x_1, x_2, \ldots, x_n$ for x that were as large as possible, and then
observe the corresponding values $Y_1, Y_2, \ldots, Y_n$ of y. If it were known that,
over the range of x involved, the uncertainty $\Delta Y$ in a measured value of Y could
be considered to be (at least approximately) constant, $\Delta Y = c$, then according to
Cotes' proposal the 'best' value to take for $\beta$ would be the value B determined by
the equation

$$\sum_{i=1}^{n} \frac{x_i}{c} \left( \frac{Y_i}{x_i} - B \right) = 0, \tag{6}$$

which may be written

$$\sum_{i=1}^{n} (Y_i - B x_i) = 0, \tag{7}$$

in which form it clearly expresses the condition of zero sum of residuals. The
solution of (7) is

$$B = \frac{\sum_{i=1}^{n} Y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{Y}}{\bar{x}}, \tag{8}$$

the ratio of the arithmetic means. In this case, Cotes' 'line of best fit', $y = Bx$,
passes through the origin and the two-dimensional center-of-gravity of the data,
$(\bar{x}, \bar{Y})$.

Example 2. Conditions and problem the same as in Example 1, except that it is
known that the uncertainty $\Delta Y_i$ of $Y_i$ is directly proportional to $x_i$, the
value of x to which it corresponds. For $\Delta Y = cx$, Cotes' procedure implies that
the 'best' value to take for $\beta$ is the solution of the equation

$$\sum_{i=1}^{n} \left( \frac{Y_i}{x_i} - B \right) = 0, \tag{9}$$

namely,

$$B = \frac{1}{n} \sum_{i=1}^{n} \frac{Y_i}{x_i}, \tag{10}$$

the arithmetic mean of the individual ratios $Y_i / x_i$.
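The two examples differ only in the weights attached to the individual ratios $Y_i/x_i$. The sketch below uses hypothetical data and illustrative function names, and assumes the weights are taken inversely proportional to the uncertainty of each ratio: proportional to $x_i$ when $\Delta Y$ is constant, and equal when $\Delta Y$ is proportional to x.

```python
from fractions import Fraction

def weighted_mean(values, weights):
    """Weighted arithmetic mean: the B that solves sum(w_i * (B_i - B)) = 0."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Hypothetical observations for y = beta * x (not taken from the text).
x = [1, 2, 4]
Y = [2, 5, 7]
ratios = [Fraction(yi, xi) for xi, yi in zip(x, Y)]   # B_i = Y_i / x_i

# Example 1: constant uncertainty in Y, so the uncertainty of B_i is c / x_i
# and the weights are proportional to x_i.
B1 = weighted_mean(ratios, x)

# Example 2: uncertainty in Y proportional to x, so every B_i is equally
# uncertain and the weights are equal.
B2 = weighted_mean(ratios, [1] * len(x))

print(B1)  # 2       (ratio of the means: sum(Y) / sum(x) = 14 / 7)
print(B2)  # 25/12   (mean of the ratios 2, 5/2, 7/4)
```

The same data thus yield two different 'best' slopes, depending solely on what is assumed about how the uncertainty of Y varies with x.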


Cotes' proposal does not generalize easily to provide a simple procedure for
determining 'best' values for all of the parameters of a two- or multi-parameter
relationship. The next step, therefore, was not a generalization of Cotes' procedure
to multi-parameter problems, but rather an extension to multi-parameter problems of
the condition of zero sum of residuals:

        [...] astronomer who made extraordinary contributions to every branch of pure
        and applied mathematics of his time, and Tobias Mayer, a German mathematician
        [...], seem to have independently devised and applied an extension of the
        condition of zero sum of residuals that is today called the Method of
        Averages; this consists of subdividing the observational points into as many
        subsets as there are coefficients to be determined, the subdivision being in
        terms of the values of (one of) the independent variable(s), and then applying
        the condition of zero sum of residuals to the points of each subset.


Provided that one is thus able to form at least as many distinct observational subsets as there are unknown parameters to be determined, the Method of Averages will always come up with a value for each parameter, but there is some arbitrariness and room for subjective choice in the formation of the subsets, with consequent effect on the answers obtained. Thus, in the case of a two-parameter line $y = \alpha + \beta x$, if the x's are more or less equally spaced throughout their range, then it is customary to choose $x_0$ so that there are an equal number of x's greater and less than $x_0$; but if the number of observational points is odd, the outcome will depend on whether the middlemost point is included in the left-hand or right-hand group. If the x's are not even symmetrically dispersed within their range, then the choice of the 'best' subdivision becomes highly subjective, and the end results correspondingly "arbitrary".
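The dependence on the placement of the middlemost point can be seen in a small numerical sketch. The code below is an illustration only: the function name `method_of_averages`, the flag `middle_goes_left`, and the five data points are assumptions made for this example and do not come from Euler's or Mayer's work.

```python
def method_of_averages(xs, ys, middle_goes_left=True):
    """Fit y = a + b*x by the Method of Averages using two subsets split on x.

    Zero sum of residuals within a subset is equivalent to the line passing
    through that subset's centroid, so the fitted line joins the two centroids.
    """
    pts = sorted(zip(xs, ys))
    n = len(pts)
    half = n // 2
    if n % 2 == 1 and middle_goes_left:
        half += 1  # the middlemost point joins the left-hand group

    def centroid(group):
        k = len(group)
        return sum(x for x, _ in group) / k, sum(y for _, y in group) / k

    (x1, y1), (x2, y2) = centroid(pts[:half]), centroid(pts[half:])
    b = (y2 - y1) / (x2 - x1)
    a = y1 - b * x1
    return a, b


# Hypothetical data: an odd number of equally spaced x-values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
a_left, b_left = method_of_averages(xs, ys, middle_goes_left=True)
a_right, b_right = method_of_averages(xs, ys, middle_goes_left=False)
print(a_left, b_left)    # middle point in the left-hand group
print(a_right, b_right)  # middle point in the right-hand group
```

With these values the two placements give visibly different slopes and intercepts, which is the arbitrariness described above.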


1757 Sometime between 1755 and 1757, Roger Joseph Boscovich, a Dalmatian Jesuit in the service of Pope Benedict XIV who made exceptional contributions in astronomy, geodesy, physics, and mathematics [77], formulated and applied the principle that, given more than two pairs of observed values of variables x and y connected by a linear functional relationship of the form $y = \alpha + \beta x$, then the values (a and b) that one should adopt for $\alpha$ and $\beta$ in order to obtain the line ($y = a + bx$) that is most nearly in accord with all of the observations should be those determined jointly by the two conditions

I. The sums of the positive and negative residuals (to the y-values) shall be equal.

II. The sum of (the absolute values of) all of the residuals, positive and negative, shall be as small as possible.

Condition I states that a and b, the intercept and slope of the best fitting line $y = a + bx$, must satisfy the equation

$$\sum_{i=1}^{n} (Y_i - a - b x_i) = 0 , \tag{11}$$

which can also be written in the form

$$\bar{Y} - a - b\bar{x} = 0 . \tag{12}$$

In other words, Condition I states that the best fitting line $y = a + bx$ shall pass through the centroid $(\bar{x}, \bar{Y})$ of the observational points. Condition II states that a and b must satisfy the equation

$$\sum_{i=1}^{n} |Y_i - a - b x_i| = \text{minimum} . \tag{13}$$

Replacing a in equation (13) by its value

$$a = \bar{Y} - b\bar{x} \tag{14}$$

implied by equation (12), it is seen that Condition II, in conjunction with Condition I, requires that the slope b shall satisfy the equation

$$\sum_{i=1}^{n} |(Y_i - \bar{Y}) - b(x_i - \bar{x})| = \text{minimum} . \tag{15}$$

Consequently, determination of the "Boscovich line" corresponding to a given set of observational points reduces to determining its slope b from equation (15) and then evaluating a from equation (14).

Boscovich seems to have evolved these criteria for determining a line of best fit to observational data sometime between 1755 and 1757. Near the end of his joint treatise with Maire [19] on determination of the Figure of the Earth from measurements of the lengths of meridian arcs and of seconds pendulums at different latitudes,* published in 1755, Boscovich examines (pp. 499-501) the lengths of five meridian arcs measured at five different latitudes, including the arc measured by Maire and himself in the vicinity of Rome, and finds that they yield mutually inconsistent values for the ellipticity of the Earth when considered in pairs. (Several years earlier Euler [18] had noticed such mutual inconsistency of four meridian arc determinations, and disposed of the problem by applying arbitrary corrections to these determinations to make them consistent with Newton's theoretical value for the Earth's ellipticity.)

* If the "Figure of the Earth" is an oblate ellipsoid of revolution, and $L = L(\varphi)$ denotes the length of a meridian arc of 1° (or of a seconds pendulum) at latitude $\varphi$, then

$$L = \alpha + \beta \sin^2 \varphi ,$$

neglecting higher order terms in $\sin^2 \varphi$, with ... , e being the ellipticity of the ellipsoid.


In hopes of inducing a satisfactory "best" value, he first examined the fit of the line determined by the two extreme points, i.e. the length of 1° at the equator ($\varphi = 0°$) and at the Arctic Circle ($\varphi = 66°$). In his judgment the other three points lay farther from this line than could be accounted for by surveying errors. Next he tried the line through the equatorial point with slope equal to the arithmetic mean of the 10 individual "slopes" yielded by taking the 5 points in all possible pairs.* Some of the residuals were, he felt, still too large. Noting that two of the individual "slopes" were very different from the other 8, he rejected these and tried the arithmetic mean of the eight. The resulting line now lay above all but one of the points. Discouraged, he put aside, for the time being, trying to get a 'best' value for the Earth's ellipticity from these data [19, pp. 497-501]. It is in Boscovich's 1757 summary [21] of this work that Boscovich states for the first time his two conditions for a line of best fit, and reports the result of analyzing the same five meridian-arc lengths by this method [21, pp. 391-392]. In this first pronouncement of his method he does not give any justification for Conditions I and II nor any indication of how he solved equation (15) to obtain the 'best' value of the slope b.

* Had he taken the weighted arithmetic means

$$b = \frac{\sum^{*} (Y_j - Y_i)(x_j - x_i)}{\sum^{*} (x_j - x_i)^2} \qquad \text{and} \qquad a = \frac{\sum^{*} (Y_i x_j - Y_j x_i)(x_j - x_i)}{\sum^{*} (x_j - x_i)^2}$$

of the 10 individual "observed slopes"

$$b_{ij} = \frac{Y_j - Y_i}{x_j - x_i} , \qquad (i = 1, \ldots, 4;\ j = i+1, \ldots, 5),$$

and 10 individual "observed intercepts"

$$a_{ij} = \frac{Y_i x_j - Y_j x_i}{x_j - x_i} , \qquad (i = 1, \ldots, 4;\ j = i+1, \ldots, 5),$$

with weights $w_{ij} = (x_j - x_i)^2$, where

$$\sum^{*} = \sum_{i=1}^{4} \sum_{j=i+1}^{5} ,$$

he would have obtained the least squares solution in disguise [63, p. 231].
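The footnoted identity, that the weighted means of the pairwise "observed slopes" and "observed intercepts" with weights $(x_j - x_i)^2$ reproduce the least squares line, can be checked numerically. The sketch below uses made-up data (not Boscovich's five arcs), and the function names are chosen here for illustration.

```python
from itertools import combinations


def weighted_pairwise_fit(xs, ys):
    """Weighted means of pairwise slopes and intercepts, weights (x_j - x_i)^2."""
    num_a = num_b = den = 0.0
    for (xi, yi), (xj, yj) in combinations(zip(xs, ys), 2):
        if xi == xj:
            continue  # such a pair has zero weight and an undefined slope
        w = (xj - xi) ** 2
        b_ij = (yj - yi) / (xj - xi)            # "observed slope"
        a_ij = (yi * xj - yj * xi) / (xj - xi)  # "observed intercept"
        num_a += w * a_ij
        num_b += w * b_ij
        den += w
    return num_a / den, num_b / den


def least_squares_line(xs, ys):
    """Ordinary least squares intercept and slope for y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return ybar - b * xbar, b


# Hypothetical five-point data set, for illustration only.
xs = [0.0, 0.3, 0.5, 0.7, 0.9]
ys = [10.0, 10.4, 10.3, 10.9, 11.0]
a_pairs, b_pairs = weighted_pairwise_fit(xs, ys)
a_ls, b_ls = least_squares_line(xs, ys)
print(a_pairs, b_pairs)
print(a_ls, b_ls)
```

The two printed lines agree up to floating-point rounding, whereas the unweighted mean of the pairwise slopes that Boscovich actually tried generally does not.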

1760 In Sections 385-396 of a prose Supplement to the second volume of a three-volume treatise on Natural Philosophy in Latin hexameters by Benedict Stay [23],* Boscovich restated his two conditions for determining the line of best fit to observational data, justifying them as follows: the first he considered to be required by the traditional assumption that positive and negative errors are of equal probability; and the second, to be necessary in order to bring the solution into closest possible agreement with the observations. He then gave a very useful algorithm of his own invention for solving equation (15) above, followed by a step-by-step illustration of its application in terms of the five meridian-arc lengths that he had considered previously. Todhunter has fittingly remarked [31, p. 332], "Boscovich's exposition of his method takes a geometrical form: it is simple, clear and instructive." A French translation of these sections is included in the Note appended to the French edition of his joint treatise with Maire [27, pp. 501-500]. His reasoning, in outline, may be found in my chapter [75, pp. 202-204] in the Boscovich Memorial Volume edited by L. L. Whyte.

* Commenting on this treatise in 1873, Isaac Todhunter remarked [31, p. 322]: "The number of students interested both in Natural Philosophy and in Latin Verse could scarcely ever have been large; and is probably less now than formerly."

1783


Laplace, in his first memoir on the Figure of the Earth [32], found, like Boscovich, that the best available determinations of meridian arcs of 1° at various latitudes gave contradictory results for the ellipticity of the Earth when considered in pairs. He therefore proposed, and carried out, a test of the ellipsoidal hypothesis, that is, of whether the equation $L = a + b \sin^2 \varphi$ is capable of representing the observed data "within the limits of the errors of observation". This test consists of fitting to these data a line of the foregoing form with a and b chosen so as to minimize the LARGEST DEVIATION, and then making a subjective judgment whether the resulting largest residual is, or is not, explainable in terms of the uncertainties of the astronomical and surveying measurements involved. After outlining a procedure for finding the line that minimizes the largest residual (which he described again in 1799 [34, Book III, Chapter 5, Section 39], and which can be read in English translation [48, pp. 417-424]), Laplace applies it to his meridian arc lengths, and finds that the largest resulting residual is on the border line of acceptability, which leads him to suspect the ellipsoidal hypothesis. In this memoir he makes no attempt to deduce 'best' values for a and b, nor for the Earth's ellipticity e (on the assumption of ellipsoidality).


1789 Laplace in his second memoir on the Figure of the Earth [33] adopted Boscovich's two criteria for a line of best fit, and gave (pp. 37 . 36) an algebraic formulation and derivation of Boscovich's algorithm for solving equation (15) above, with the following comments: "Boscovich has given for this purpose an ingenious method which is explained at the end of the first (French) edition of his Voyage Astronomique et Géographique [27] but as its utilization is complicated by the need to consider geometrical figures, I am going to present it here in a most simple analytical form." Laplace's algebraic formulation of Boscovich's algorithm, expressed in modern notation, is as follows:


Consider only those terms of the summation (15) for which $x_i \neq \bar{x}$. For each such term evaluate the corresponding implied slope

$$b_i = (Y_i - \bar{Y}) / (x_i - \bar{x}) . \tag{16}$$

Arrange these in descending order of magnitude, thus

$$b_{(1)} \geq b_{(2)} \geq b_{(3)} \geq \cdots \geq b_{(m)} , \qquad (m \leq n). \tag{17}$$

Arrange the absolute values of their denominators in the corresponding order, thus

$$|x_{(1)} - \bar{x}| ,\ |x_{(2)} - \bar{x}| ,\ \ldots ,\ |x_{(m)} - \bar{x}| . \tag{18}$$

Compute the sum of all of the "absolute denominators",

$$D = \sum_{j=1}^{m} |x_{(j)} - \bar{x}| . \tag{19}$$

Then the sum called for in (15) will be a minimum for $b = b_{(r)}$, the rth term of the sequence (17), where r is the smallest integer for which the partial sum of the first r terms of the sequence (18) equals or exceeds one-half of the sum of the entire sequence, i.e., the smallest r for which

$$\sum_{j=1}^{r} |x_{(j)} - \bar{x}| \geq \tfrac{1}{2} D . \tag{20}$$

If in determining r the inequality sign holds, then the solution $b = b_{(r)}$ is unique; but if the equality sign holds, then the solution is not unique and (15) attains the same minimum for any value of b between $b_{(r)}$ and $b_{(r+1)}$ inclusive.
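The steps (16) to (20) amount to taking a weighted median of the implied slopes, with the absolute denominators as weights. A minimal Python sketch under that reading follows; the function names, the optional `pivot` argument and the data are assumptions made for illustration, and in the tie case (equality in (20)) the sketch simply returns $b_{(r)}$, one of the admissible values.

```python
def boscovich_slope(xs, ys, pivot=None):
    """Slope minimizing sum |(Y_i - y0) - b (x_i - x0)| for a fixed pivot (x0, y0).

    With pivot=None the centroid is used, as in equation (15).
    """
    n = len(xs)
    x0, y0 = (sum(xs) / n, sum(ys) / n) if pivot is None else pivot
    # (16): implied slopes, paired with their absolute denominators.
    terms = [((y - y0) / (x - x0), abs(x - x0))
             for x, y in zip(xs, ys) if x != x0]
    if not terms:
        raise ValueError("all x-values coincide with the pivot")
    terms.sort(key=lambda t: t[0], reverse=True)  # (17): descending slopes
    half = sum(w for _, w in terms) / 2           # (19): D / 2
    partial = 0.0
    for slope, w in terms:
        partial += w
        if partial >= half:                       # (20): smallest such r
            return slope
    return terms[-1][0]  # guard against floating-point rounding


def boscovich_line(xs, ys):
    """Intercept and slope satisfying Conditions I and II."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = boscovich_slope(xs, ys)
    return ybar - b * xbar, b                     # (14)


def sum_abs_residuals(xs, ys, a, b):
    return sum(abs(y - a - b * x) for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 6.2]
a, b = boscovich_line(xs, ys)
print(a, b, sum_abs_residuals(xs, ys, a, b))
```

No iteration or search is needed: one sort and one running sum suffice, which is presumably why the procedure was practicable by hand.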


The Boscovich and Laplace algorithms for solving equation (15) are of greater generality than may seem to be the case at first sight. Their validity does not depend in any way upon the fact that the pivot point $(\bar{x}, \bar{Y})$ is the centroid of the observational points $(x_i, Y_i)$. Consequently, if it is desired to determine the coefficients of the line $y = a + bx$ that passes through some particular point $(x_0, y_0)$ and satisfies Condition II, then the required value of the slope b can be found directly from either Boscovich's or Laplace's form of the algorithm, with $\bar{x}$ and $\bar{Y}$ replaced by $x_0$ and $y_0$, respectively; and the corresponding value of a then found from the relation $a = y_0 - b x_0$.


1799 Laplace, in Book III, Chapter 5 of his Mécanique Céleste [34], published in 1799, discusses the figure of the Earth in great detail. In Section 39, he describes again [48, pp. 417-424] the method that he had used in 1783 [32] to determine the line of the form $L = a + b \sin^2 \varphi$ that minimizes the absolute value of the maximum residual, and then gives [48, pp. 424-434] an alternate procedure for achieving the same end "when the number of observations is considerable". He begins Section 40 [48, pp. 434-442] with the remark: "The ellipsis, determined in the preceding article, serves to ascertain whether the elliptical figure is within the errors of observation; but it does not determine, from the measured degrees, the figure which seems most probable. It appears to me, that this last ellipsis ought to satisfy the following conditions". He then gives Boscovich's two conditions for a line of best fit (but without mention of Boscovich!) and adds: "By considering, in this manner, whole arcs, instead of degrees which have been deduced from them, we shall give to each of these degrees so much more influence, in the computation of the ellipticity of the earth, as the corresponding arc is of greater extent, which ought to be the case". To this end he extends [48, pp. 438-442] his 1789 algebraic formulation [33] of Boscovich's technique to the case of observational points of unequal weight, expressing Conditions I and II in terms of weighted residuals and appropriately modifying his own previous algebraic formulation and derivation of Boscovich's algorithm. In Section 41 [48, pp. 443-468] he utilizes data for seven meridian arcs varying in length from 1.1° to 10.7°, and in latitude from 0.0° to 73.7°, first to test the ellipticity hypothesis by means of his procedure for minimization of the maximum residual, and then to determine the "most probable" value of the earth's ellipticity. To this end he applies his (second) min-max procedure to these data reduced to correspond to 1° arcs, the respective data points being taken to have equal weight, and finding the resulting minimum maximum residual of 48.60 double toises (i.e. 97.2 toises = 189+ meters) to be "exactly on the limit of those which might be considered as possible", he reasons that "we must therefore admit, in the elliptical hypothesis, much greater [deviations] ... therefore it seems to follow -- that the variations of the degrees of the terrestrial meridian differ sensibly from the law -- given by the hypothesis of elliptical meridians" [48, p. 448]. Nevertheless, he goes on to apply his extension of the Boscovich procedure to these same data points weighted in proportion to the lengths (in degrees) of the arcs actually measured, in order to find the "most probable" value of the Earth's ellipticity if it is an ellipsoid (or, more accurately, the ellipticity of the best-fitting ellipsoid), and finds that the best-fitting ellipsoid implies a residual of 86.26 double toises (i.e., 172.52 toises = 336+ meters) in the Lapland degree (latitude 73.7°), which "is much too great to be admitted [and thus] confirms what we have said, that the earth varies sensibly from an elliptical figure". Bowditch, however, points out here [48, p. 450] that "an error of this magnitude did actually exist", the length of the Lapland degree having been found to be 200 toises less on remeasurement.

1805


Adrien Marie Legendre (1752-1833), one of the outstanding French mathematicians at the end of the 18th, and start of the 19th centuries, says in the Preface to his book on "New Methods for Determining the Orbits of Comets" published in 1805 [35, p. viii]:

"It is necessary then, when all of the conditions of the problem are expressed conveniently, to determine the coefficients in a manner that renders the errors as small as possible. For this effect, the method that appears to me the most simple and the most general consists in rendering a minimum the sum of the squares of the errors. One obtains thus as many equations as there are unknown coefficients, which serves to determine all of the elements of the orbit. ... the method of which I have just spoken, and which I call the Method of least squares, can be of great utility...".

An application of this procedure to the solution of three equations is given on p. 64, illustrating the now well-known rule for forming the so-called normal equations, which is stated explicitly in an Appendix "On the Method of Least Squares" [35, pp. 72-80]. (An English translation of pages 72-75, by Professors Henry A. Ruger and Helen M. Walker of Teachers College, Columbia University, is given in David Eugene Smith's Source Book in Mathematics [61, pp. 576-579].) In this Appendix, Legendre remarks:

"Of all the principles which can be proposed for [achieving an adjustment of observations such that] the extreme errors, positive or negative, shall be confined within as narrow limits as possible ... there is none more general, more exact, and more easy of application, than that ... which consists of rendering the sum of squares of the errors a minimum. By this means there is established among the errors a sort of equilibrium which, preventing the extremes from exerting an undue influence, is very well fitted to reveal that state of the system which most nearly approaches the truth."
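The rule for forming the normal equations, which yields "as many equations as there are unknown coefficients", can be sketched in a few lines. The code below is a modern illustration, not Legendre's own layout: each observational equation is a row of coefficients in the hypothetical matrix `A` with observed value in `y`, and the data are invented for the example.

```python
def normal_equations(A, y):
    """Form N = A^T A and c = A^T y from the observational equations A x ~ y."""
    m = len(A[0])
    N = [[sum(row[j] * row[k] for row in A) for k in range(m)] for j in range(m)]
    c = [sum(row[j] * yi for row, yi in zip(A, y)) for j in range(m)]
    return N, c


def solve(N, c):
    """Solve N x = c by Gaussian elimination with partial pivoting."""
    m = len(c)
    M = [row[:] + [ci] for row, ci in zip(N, c)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for k in range(col, m + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x


# Hypothetical observational equations a + b*x_i = Y_i (four equations, two unknowns).
xs = [0.0, 1.0, 2.0, 3.0]
Y = [1.0, 3.1, 4.9, 7.0]
A = [[1.0, x] for x in xs]
N, c = normal_equations(A, Y)
a, b = solve(N, c)
residuals = [yi - a - b * x for x, yi in zip(xs, Y)]
print(N, c)
print(a, b)
```

Here four observational equations reduce to two normal equations in the two unknowns, and the resulting residuals sum to zero and are uncorrelated with the x-values.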

Unfortunately, Legendre, throughout his exposition of his "Méthode des moindres quarrés", used the term "errors" for what are more accurately termed residuals. This has served to confuse the unwary and to conceal the distinction between what he merely asserted in 1805 and what Gauss in 1821 [44] showed to be a statistical property of the procedure. The essence of what he really said is this: If in the interest of achieving an objective adjustment one seeks to minimize the mutual inconsistencies of the observations as measured by some simple function of their residuals, then the practical requirements of general applicability, unique arithmetical solutions, and ease of computation lead to the adoption of the technique of Least Sum of Squared RESIDUALS. No probability considerations were involved. And his "discovery" simply marked the culmination of the attempts by Euler, Mayer, Boscovich, Laplace and others, to develop a practicable objective method of adjustment based solely on consideration of residuals.


4. Probability Distributions of Errors and "Most Probable" Values

The error of any measurement of a particular quantity is, by definition, the difference between the measurement concerned and the true value of the magnitude of this quantity, taken positive or negative according as the measurement is greater or less than the true value. In other words, if x denotes a single measurement of a quantity, or an adjusted value derived from a specific set of individual measurements, and $\tau$ is the true value of the magnitude of the quantity concerned, then, by definition,

the error of x as a measurement of $\tau$ = $x - \tau$.

The error of any particular measurement, x, is, therefore, a fixed number. The numerical magnitude and sign of this number will ordinarily be unknown and unknowable, because the true value of the magnitude of the quantity concerned is ordinarily unknown and unknowable. A mathematical theory of errors is not possible so long as the errors of individual measurements are regarded as unique quantities associated with the particular measurements concerned. A mathematical theory of errors is possible only when the error of a particular measurement is regarded as an instance of the errors characteristic of measurements of the same quantity that might have been, or might be, yielded by the same measurement process under the same conditions. This fundamental step was not taken until 1755:

1755 On March 4, 1755, Thomas Simpson (1710-1761), Professor of Mathematics at the Woolwich Military Academy, addressed "A Letter to the Right Honourable George Earl of Macclesfield, President of the Royal Society, on the advantage of taking the mean of a number of observations, in practical astronomy" [20]. This remarkable letter began as follows:

"My Lord, it is well known to your Lordship, that the method practised by astronomers, in order to diminish the errors arising from the imperfections of instruments, and of the organs of sense, by taking the Mean of several observations, has not been so generally received, but that some persons, of considerable note, have been of opinion, and even publickly maintained, that one single observation, taken with due care, was as much to be relied on as the Mean of a great number.

As this appeared to me to be a matter of much importance, I had a strong inclination to try whether, by the application of mathematical principles, it might not receive some new light; from whence the utility and advantage of the method in practice might appear with a greater degree of evidence. In the prosecution of this design (the result of which I have now the honour to transmit to your Lordship) I have, indeed, been obliged to make use of an hypothesis, or to assume a series of numbers, to express the respective chances for the different errors to which any single observation is subject; ...

Should not the assumption, which I have made use of, appear to your Lordship so well chosen as some others might be, it will, however, be sufficient to answer the intended purpose; and your Lordship will find, on calculation, that, whatever series is assumed for the chances of the happening of the different errors, the result will turn out greatly in favour of the method now practised, by taking a mean value."


Simpson's first "assumption" was that the errors of measurements of a single quantity by a particular measurement process be regarded as taking the values $-v, -v+1, \ldots, -2, -1, 0, 1, 2, \ldots, v-1, v$ with equal probabilities, i.e. a discrete uniform distribution. Second, he proposed that the errors be regarded as taking on the above values with probabilities proportional to $1, 2, \ldots, v-1, v, v+1, v, v-1, \ldots, 2, 1$, respectively, i.e. a discrete triangular distribution. Then, utilizing the generating function techniques employed extensively by Abraham De Moivre (1667-1754) and others in the solution of problems relating to tosses of dice and other games of chance, Simpson derived, for each of these assumptions, the probability distribution of the sum of n independent errors from such a distribution, and from this the corresponding distribution of the arithmetic mean of n independent errors. He summed up his findings as follows:

"Upon the whole of which it appears, that the taking of the Mean of a number of observations, greatly diminishes the chances for all the smaller errors, and cuts off almost all possibility of any great ones: which last consideration, alone, seems sufficient to recommend the use of the method, not only to astronomers, but to all others concerned in making of experiments of any kind (to which the above reasoning is equally applicable). And the more observations or experiments there are made, the less will the conclusion be liable to err, provided they admit of being repeated under the same circumstances." [20, pp. 92-93]
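The kind of calculation Simpson describes can be reproduced exactly with a few lines of code, since multiplying generating functions is the same as convolving the lists of probabilities. The sketch below is a modern restatement, not Simpson's own working; the values v = 5, n = 6 and the bound of one unit are chosen here purely for illustration.

```python
from fractions import Fraction


def triangular_pmf(v):
    """Probabilities for errors -v..v proportional to 1, 2, ..., v+1, ..., 2, 1."""
    weights = [v + 1 - abs(e) for e in range(-v, v + 1)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]


def convolve(p, q):
    """Distribution of the sum of two independent errors (polynomial product)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def pmf_of_sum(pmf, n):
    """Distribution of the sum of n independent errors with the given pmf."""
    out = [Fraction(1)]
    for _ in range(n):
        out = convolve(out, pmf)
    return out


def prob_mean_within(v, n, bound):
    """P(|mean of n errors| <= bound); index k of the sum's pmf is the sum k - n*v."""
    s = pmf_of_sum(triangular_pmf(v), n)
    return sum(p for k, p in enumerate(s) if abs(k - n * v) <= bound * n)


p_single = prob_mean_within(v=5, n=1, bound=1)
p_mean_of_six = prob_mean_within(v=5, n=6, bound=1)
print(p_single, float(p_single))
print(p_mean_of_six, float(p_mean_of_six))
```

Using `Fraction` keeps the probabilities exact, so the comparison between a single observation and the mean of several involves no rounding.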


It should be noted that Simpson did not prove that "taking of the arithmetic mean" was the best thing to do but merely that it is good. However, in accomplishing this goal he did something much more important: he took the bold step of regarding errors, not as individual unrelated happenings, but as properties of the measurement process itself in conjunction with the instrument employed and the observer involved. He thus opened the way to a mathematical theory of measurement based on the mathematical theory of probability.


1757 In a second paper on "the advantage of taking the mean" [22], Simpson finds the distribution of the mean of n independent errors from a continuous triangular distribution, by proceeding to the limit as the number of possible error values in the interval $(-v, +v)$ tends to infinity.

1774 Joseph Louis Lagrange (1736-1813), an Italian by birth, German by adoption and a Frenchman by choice, one of the greatest mathematicians of all time, in a long mémoire "on the utility of taking the mean" [23] provides a somewhat more rigorous treatment of Simpson's analyses (without mention of Simpson) and, by similar passage to the limit, derives the distribution of the mean of n independent errors from a continuous uniform distribution. Origin of the expression "law of facility of error".

1774 Laplace in the first paper on the statistical theory of estimation [29] proposes the double-exponential distribution,

$$\varphi(y) = \frac{m}{2} e^{-m|y|} , \qquad -\infty < y < +\infty ,$$

as a law of error, where $y = x - \tau$, $\tau$ being the true value of the quantity of which x is a measurement.

1778

Daniel Bernoulli (1700-1782) proposes the semi-circular law of error,

$$f(y) \propto [a^2 - y^2]^{1/2} , \qquad -a \leq y \leq +a ,$$

and maximization of $\prod_{i=1}^{n} [a^2 - (x_i - \tau)^2]^{1/2}$ with respect to $\tau$ to obtain the "most probable" value of $\tau$. For n = 1, 2 this yields $\tau = \bar{x}$. For $n \geq 3$ it leads to unmanageable equations.

1778 Laplace proposes

$$\varphi(y) = \frac{1}{2a} \log \frac{a}{|y|} , \qquad -a \leq y \leq +a ,$$

as a law of error.

1795-1802 In 1795, at the age of eighteen, Carl Friedrich Gauss, mathematical peer of Archimedes (287-212 B.C.) and Sir Isaac Newton (1642-1727), and unequalled in mathematical precocity, discovered the advantages of the technique of Least Sum of SQUARED RESIDUALS for adjustment of observations in geodesy. "Originally Gauss did not attach great importance to the method of least squares; he felt it was so natural that it must have been used by many who were engaged in numerical calculations. Frequently he said that he would be willing to bet that the elder Tobias Mayer [1723-1762] had used it in his calculations. Later he discovered by examining Mayer's papers that he would have lost the bet" [72, p. 113]. In 1797 he concluded that determination of most probable values of observed quantities required knowledge of the law of error involved [Werke, vol. IV, p. 98]. By June 1798 he had completed his now famous "first proof" via the calculus of probabilities [72, p. 113]. On January 1, 1801, Giuseppe Piazzi at Palermo discovered a new planet, Ceres, which he was able to observe only until February 11th, after which it was obscured by the Sun. His attempts to compute its orbit from so few observations were unsuccessful, and when he looked for it, when it should have emerged from the Sun's rays, he could not find it. Other European astronomers tried and also failed. The September 1801 issue of von Zach's Monatliche Correspondenz gave Piazzi's complete observations, and so reached Gauss; the October issue reported that all efforts to relocate the planet had failed. Gauss's diary notes show that he was working on the Ceres problem from early November 1801 onward [72, p. 32], and before the end of 1801 Ceres was found again "at the place predicted by him" which was "quite different from the former computations" [76, p. 353]. In April 1802, Heinrich Olbers found another new planet, Pallas, in the region where he had looked for Ceres [76, p. 353]. Gauss applied his orbit calculation techniques, including the method of least squares, to evaluate the orbit of Pallas also, and then wrote up his technique and findings in a "Summary survey of the methods applied in the determination of the orbits of both new planets", which he sent to Olbers on August 6, 1802. Olbers did not return it to Gauss until November 1805 [72, p. 33], i.e., not until after the appearance of Legendre's proclamation of his own independent formulation of the "Méthode des moindres quarrés" in his treatise on "New Methods for Determining the Orbits of Comets" [35]. Thus, in newspaper parlance, was Gauss scooped! Following the publication of Gauss's treatise on "Theory of the Motion of Heavenly Bodies" in 1809, in which Gauss remarked [37, Book II, Sec. 3, Article 186] that "we have made use of [this principle] since the year 1795", Legendre became indignant and accused Gauss of appropriating his method of least squares. To help set the record straight, Gauss's 1802 "Summary Survey" was published in its entirety in Zach's Monatliche Correspondenz for September 1809 [38].

1808 Robert Adrain (1775-1843) of Philadelphia publishes two derivations of

$$f(y) = \frac{h}{\sqrt{\pi}} e^{-h^2 y^2} , \qquad -\infty < y < +\infty ,$$

as a law of error, his second derivation being the two-dimensional analog (in terms of errors) of Clerk Maxwell's derivation of the tri-variate normal distribution of velocities in a gas.

1809


Gauss publishes his "first proof" of the Method of Least Squares in Book II, Section 3 of his Theoria Motus [37]. In evolving his "proof" he (a) adopted as a postulate the widely accepted Principle of the Arithmetic Mean, (b) took over the concept that repetition of a measurement process generates a probability distribution of errors, and (c) applied Bayes' method of inverse probability [24, 25], without reference to Bayes. Starting from these premises he showed that if the arithmetic mean of n independent measurements of a single magnitude is to be the most probable value of this magnitude a posteriori, then the errors $y_i = x_i - \tau$ of the individual measurements $x_i$ must be distributed in accordance with the law of error

$$f(y) = \frac{h}{\sqrt{\pi}} e^{-h^2 y^2} , \qquad -\infty < y < +\infty .^{*}$$

* Henri Poincaré (1854-1912) pointed out in 1896 [57, pp. 152-155] that Gauss's proof depends subtly on f(y) being a "law of error" in the sense that the true value $\tau$ is (to use modern terminology) the location parameter of the probability distribution of the measurement x, say, $\varphi(x) = \frac{h}{\sqrt{\pi}} e^{-h^2 (x - \tau)^2}$. If $\bar{x}$ is merely to be the maximum likelihood estimator of some parameter $\theta$, then a broader class of distributions is involved [57, p. 155; 68, Example 17.01].

Then he showed that, if errors are normally distributed, and if the unknown values of the essential parameters have uniform a priori distributions, then the most probable values of the unknowns implied by a given set of observational data are given identically by the application of the technique of Least Sum of Squared RESIDUALS. The expression "most probable values" had, and still has, a great popular appeal; and it is this, rather than Gauss's second approach, that is usually given as "the theoretical basis" of Least Squares in traditional books on the adjustment of observations and the theory of errors. In consequence, Gauss's first "proof" of the Method of Least Squares is the best known today among applied scientists. Gauss himself, however, in a letter to Bessel (1839) rejected this justification of the Method of Least Squares on the grounds that, from a purely practical viewpoint, maximizing the probability of a zero error (of estimate) is less important than minimizing the probability of any large error (of estimate). This, his first paper on the subject, remains notable, however, because in it he notes that h reflects the precision of the distribution of errors f(y), and gives his famous rules for weighting results of unequal precision so as to obtain final results of maximum attainable precision.
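The step from the normal law of error to least squares can be made explicit in the one-parameter case. What follows is a sketch in the notation of this section ($x_i$, $\tau$, $h$), assuming a uniform a priori distribution for $\tau$; the symbol $p(\tau \mid x_1, \ldots, x_n)$ for the a posteriori density is introduced here for illustration and is not Gauss's own notation.

```latex
p(\tau \mid x_1,\dots,x_n) \;\propto\; \prod_{i=1}^{n} f(x_i-\tau)
  \;=\; \left(\frac{h}{\sqrt{\pi}}\right)^{n}
        \exp\!\Big\{-h^{2}\sum_{i=1}^{n}(x_i-\tau)^{2}\Big\},
\qquad
\frac{\partial}{\partial\tau}\sum_{i=1}^{n}(x_i-\tau)^{2}
  \;=\; -2\sum_{i=1}^{n}(x_i-\tau) \;=\; 0
\;\;\Longrightarrow\;\; \tau=\bar{x}.
```

Maximizing the a posteriori density is thus the same as minimizing the sum of squared residuals, and for a single magnitude the minimizing value is the arithmetic mean. With several unknowns the same argument applies with $x_i - \tau$ replaced by the residual of the i-th observational equation.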

1809 Laplace deduces [39] the Central Limit Theorem; consequently Gauss's Least Squares gives the "most probable value" under more general conditions when the number of observations is large!


5. Minimum Errors of Estimation


1774 Laplace suggests [29] that the "best mean" to take in practical astronomy is that function of the observations which has an equal probability of over- and under-estimating the true value; shows that this is equivalent to adopting the principle of Least Mean ABSOLUTE ERROR of ESTIMATION; and gives an algorithm for finding this particular function of three observations in a one-parameter case.

1778 Laplace extends the foregoing approach to the case of n independent observations in the one-parameter case [31], and terms this form of estimation the "Most Advantageous Method".

1811 Laplace shows [40] that among all distributions of the form $\varphi(x) = K e^{-\psi[(x - \tau)^2]}$, the normal distribution, for which $\psi[(x - \tau)^2] = h^2 (x - \tau)^2$, is the only one for which $\bar{x}$ is the "most advantageous" estimator of $\tau$.


1821-23 By adopting instead the principle of Least Mean Squared ERROR of ESTIMATION and the requirement that the resulting "best mean" should yield the true values of the quantities concerned if it should happen that all of the observations were entirely free from error, Gauss shows [44, 45] that, when the resulting "best values" are linear functions of the observations, then they are identically the same as those given by the technique of Least Sum of Squared RESIDUALS, which provides the practical modus operandi for obtaining them. This fact, which mathematical statisticians typically express by saying that the Method of Least Squares yields minimum variance linear unbiased estimators of the unknown magnitudes concerned "under very general conditions", is considered by many mathematical statisticians today to be the real "theoretical basis" of the Method of Least Squares, and an explanation of the robust survival and conspicuous utility of Least Squares as a tool of applied science. Nevertheless, this "best linear unbiased estimator" property of Least Squares seems to be unknown to many users of the Method. Moreover, this one-to-one correspondence between minimizing some function of RESIDUALS and minimizing the same function of ERRORS of ESTIMATION appears to be a unique property of least squares.
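The "minimum variance linear unbiased" property can be illustrated for the slope of a line $y = \alpha + \beta x$. Any estimator of the form $\sum c_i Y_i$ is unbiased for $\beta$ when $\sum c_i = 0$ and $\sum c_i x_i = 1$, and, assuming uncorrelated errors of equal variance $\sigma^2$, its variance is $\sigma^2 \sum c_i^2$. The sketch below compares three such estimators on hypothetical equally spaced x-values; the function names are chosen here for illustration.

```python
def ls_slope_weights(xs):
    """Coefficients c_i of the least squares slope, b = sum c_i * Y_i."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return [(x - xbar) / sxx for x in xs]


def endpoint_slope_weights(xs):
    """Coefficients of the slope of the line through the two extreme points."""
    c = [0.0] * len(xs)
    span = xs[-1] - xs[0]
    c[0], c[-1] = -1.0 / span, 1.0 / span
    return c


def averages_slope_weights(xs, n_left):
    """Coefficients of the Method of Averages slope (first n_left points vs the rest)."""
    left, right = xs[:n_left], xs[n_left:]
    gap = sum(right) / len(right) - sum(left) / len(left)
    return ([-1.0 / (len(left) * gap)] * len(left)
            + [1.0 / (len(right) * gap)] * len(right))


def is_unbiased_for_slope(c, xs, tol=1e-12):
    return abs(sum(c)) < tol and abs(sum(ci * x for ci, x in zip(c, xs)) - 1.0) < tol


def variance_factor(c):
    """Variance of sum c_i * Y_i in units of sigma^2."""
    return sum(ci * ci for ci in c)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical x-values, sorted ascending
candidates = {
    "least squares": ls_slope_weights(xs),
    "extreme points": endpoint_slope_weights(xs),
    "method of averages": averages_slope_weights(xs, n_left=3),
}
for name, c in candidates.items():
    print(name, is_unbiased_for_slope(c, xs), variance_factor(c))
```

All three estimators are linear and unbiased, but the least squares coefficients give the smallest variance factor, in line with the property described above.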
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